In this part, we apply temporal prediction models to each district and beat, then examine the predictions and the spatial distribution of the residuals.

## OGR data source with driver: ESRI Shapefile 
## Source: "/Users/xiaomuliu/CrimeProject/SpatioTemporalModeling/CPDShapeFiles/", layer: "beat_bndy"
## with 288 features and 3 fields
## Feature type: wkbPolygon with 2 dimensions
## OGR data source with driver: ESRI Shapefile 
## Source: "/Users/xiaomuliu/CrimeProject/SpatioTemporalModeling/CPDShapeFiles/", layer: "district_bndy"
## with 28 features and 3 fields
## Feature type: wkbPolygon with 2 dimensions

Some of the explanatory variables are weather variables retrieved from WeatherAnalytics.com through its API. The hourly weather data are converted to daily values and trimmed to nine variables, including air temperature, dew point temperature, relative humidity, surface air pressure, cloud cover, apparent (feels-like) temperature, precipitation, and wind speed. Their one-day and two-day differences are also calculated.
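The weather preprocessing described above can be sketched as follows. This is a minimal illustration, not the original pipeline: the function names `hourly_to_daily` and `lagged_difference` are hypothetical, the daily value is assumed to be the mean of 24 hourly readings, and the "two-day difference" is assumed to mean \(x(t) - x(t-2)\).

```python
import numpy as np

def hourly_to_daily(hourly, hours_per_day=24):
    """Average an (n_hours, n_vars) array into (n_days, n_vars) daily means.

    Assumption: every day has exactly `hours_per_day` readings and the mean
    is the daily summary (a total may suit precipitation better).
    """
    hourly = np.asarray(hourly, dtype=float)
    n_days = hourly.shape[0] // hours_per_day
    trimmed = hourly[: n_days * hours_per_day]  # drop an incomplete last day
    return trimmed.reshape(n_days, hours_per_day, -1).mean(axis=1)

def lagged_difference(daily, lag):
    """Return x(t) - x(t - lag); the first `lag` rows are NaN (no history)."""
    daily = np.asarray(daily, dtype=float)
    diff = np.full_like(daily, np.nan)
    diff[lag:] = daily[lag:] - daily[:-lag]
    return diff

# Toy example: three days of hourly temperature, constant within each day.
hourly_temp = np.repeat([10.0, 13.0, 11.0], 24).reshape(-1, 1)
daily_temp = hourly_to_daily(hourly_temp)
one_day_diff = lagged_difference(daily_temp, lag=1)
two_day_diff = lagged_difference(daily_temp, lag=2)
```

The leading NaN rows mark days without enough history; they would have to be dropped or imputed before regression.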

Here we show a few examples of district-level crime count time series and their autocorrelations

A few examples of beat-level crime count time series and their autocorrelations
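For readers who want to reproduce the autocorrelations of a count series, the sample autocorrelation function can be sketched as below. The helper name `sample_acf` and the toy weekly series are illustrative assumptions, not data from this report.

```python
import numpy as np

def sample_acf(series, max_lag):
    """Sample autocorrelation at lags 0..max_lag (biased estimator)."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array(
        [np.dot(x[: len(x) - k], x[k:]) / denom for k in range(max_lag + 1)]
    )

# Toy daily counts with an exact weekly pattern, repeated for 20 weeks.
counts = np.tile([5, 3, 4, 6, 8, 9, 7], 20)
acf = sample_acf(counts, max_lag=14)
```

A strong weekly cycle appears as peaks at lags 7 and 14, which is the kind of structure that day-of-week and lagged variables are meant to absorb.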

For each district or beat \(i\), we fit a time series regression model of the form \[ y_i(t) = f_{Ti}(t) + f_{Ri}(X,t,y_i) + \epsilon_i(t) \] where \(f_{Ti}\) is a linear trend plus an annual harmonic, \[ f_{Ti}(t) = \beta_0+\beta_1 t+\beta_2 \sin\left(\frac{2\pi t}{365.25}\right)+\beta_3 \cos\left(\frac{2\pi t}{365.25}\right) \]

and \(f_{Ri}\) models the residuals. \(X\) contains the weather variables, their one-day and two-day differences, and the order-1 and order-2 lagged variables.

\[ f_{Ri}(X) = \mathbf{X}\beta \]
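The trend component \(f_{Ti}\) is linear in \(\beta_0,\dots,\beta_3\), so it can be fitted by least squares on a four-column design matrix. The sketch below uses ordinary least squares on synthetic, noise-free data purely for illustration; the helper names `trend_design` and `fit_trend` are hypothetical, and the report itself fits the trend by WLS or IRLS.

```python
import numpy as np

def trend_design(t, period=365.25):
    """Design matrix with columns [1, t, sin(2*pi*t/period), cos(2*pi*t/period)]."""
    t = np.asarray(t, dtype=float)
    omega = 2.0 * np.pi * t / period
    return np.column_stack([np.ones_like(t), t, np.sin(omega), np.cos(omega)])

def fit_trend(t, y):
    """Least-squares estimate of (beta_0..beta_3) and the fitted trend."""
    A = trend_design(t)
    beta, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
    return beta, A @ beta

# Synthetic two-year daily series generated from known coefficients.
t = np.arange(730)
true_beta = np.array([20.0, -0.005, 3.0, 1.5])
y = trend_design(t) @ true_beta

beta_hat, trend = fit_trend(t, y)
residual = y - trend  # the part that f_Ri is then asked to explain
```

In this two-stage view, the residual series is what gets regressed on \(\mathbf{X}\) in the second step.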

The variables were selected by the LASSO, with the penalty parameter \(\lambda\) chosen by cross-validation.
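To make the selection step concrete, here is a small self-contained sketch of LASSO by coordinate descent with \(\lambda\) picked by K-fold cross-validation. It is not the implementation used for this report: the functions `lasso_cd` and `cv_lasso`, the \(\lambda\) grid, the interleaved fold assignment, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def soft_threshold(z, gamma):
    """Soft-thresholding operator used by the LASSO coordinate update."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Minimise (1/2n)||y - Xb||^2 + lam*||b||_1 by coordinate descent.

    X and y are assumed centred, so no intercept is fitted. A fixed number
    of sweeps is used instead of a convergence test, for brevity.
    """
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.astype(float).copy()           # running residual y - Xb
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            r += X[:, j] * b[j]          # remove feature j from the fit
            rho = X[:, j] @ r / n
            b[j] = soft_threshold(rho, lam) / col_sq[j]
            r -= X[:, j] * b[j]          # add the updated feature back
    return b

def cv_lasso(X, y, lambdas, n_folds=5):
    """Return the lambda with the lowest mean held-out MSE, and all MSEs."""
    folds = np.arange(len(y)) % n_folds  # interleaved folds (an assumption)
    mse = np.zeros(len(lambdas))
    for i, lam in enumerate(lambdas):
        for k in range(n_folds):
            train, test = folds != k, folds == k
            mu_x, mu_y = X[train].mean(axis=0), y[train].mean()
            b = lasso_cd(X[train] - mu_x, y[train] - mu_y, lam)
            pred = (X[test] - mu_x) @ b + mu_y
            mse[i] += np.mean((y[test] - pred) ** 2) / n_folds
    return lambdas[int(np.argmin(mse))], mse

# Synthetic data: only features 0 and 3 actually drive the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + 0.1 * rng.normal(size=200)

lambdas = [1.0, 0.3, 0.1, 0.03, 0.01]
best_lam, cv_mse = cv_lasso(X, y, lambdas)
coef = lasso_cd(X - X.mean(axis=0), y - y.mean(), best_lam)
selected = np.flatnonzero(np.abs(coef) > 1e-8)
```

For time series, interleaved or random folds let neighbouring days leak between training and test sets, so a blocked or forward-chaining split may be more appropriate.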

For this part, we implemented a generic function that lets users adjust the following: (1) the explanatory variables (one or more of the following variable sets: weather, weather differences, day of week, month, and lagged variables); (2) the scaling method (z-score, 0-1, or no standardization); (3) the approach to fitting the seasonal trend (weighted least squares (WLS) or iteratively reweighted least squares (IRLS)); (4) the response type of the GLM (Gaussian or Poisson).
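As an illustration of the scaling option, the three choices can be written as one small helper. The function name `scale_features` and its `method` values are hypothetical and do not reflect the interface of the function described above.

```python
import numpy as np

def scale_features(X, method="minmax"):
    """Scale the columns of X by z-score, 0-1 (min-max), or not at all."""
    X = np.asarray(X, dtype=float)
    if method == "none":
        return X.copy()
    if method == "zscore":
        center, spread = X.mean(axis=0), X.std(axis=0)
    elif method == "minmax":
        center = X.min(axis=0)
        spread = X.max(axis=0) - center
    else:
        raise ValueError(f"unknown scaling method: {method!r}")
    spread = np.where(spread == 0.0, 1.0, spread)  # constant columns map to 0
    return (X - center) / spread

# Column 0: a temperature-like variable; column 1: a 0/1 dummy variable.
X = np.array([[10.0, 0.0], [20.0, 1.0], [30.0, 0.0]])
X_minmax = scale_features(X, "minmax")
X_zscore = scale_features(X, "zscore")
```

Under 0-1 scaling a 0/1 dummy column stays unchanged, so continuous and categorical predictors end up on the same range.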

We experimented with combinations of the above options and found that they have little effect on the fitting accuracy in terms of MSE. So here we only plot the results obtained with all available explanatory variables, 0-1 scaling (since we have categorical variables), the trend fitted by IRLS, and a Gaussian response.
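The idea behind fitting the trend by IRLS can be sketched as follows: refit a weighted least-squares problem repeatedly, each time down-weighting observations with large residuals. The weight function is not stated in this report, so the Huber weights, the tuning constant 1.345, and the MAD scale estimate below are assumptions, as are the function name `irls_fit` and the synthetic data.

```python
import numpy as np

def irls_fit(A, y, n_iter=20, c=1.345):
    """Robust linear fit by IRLS with Huber weights (assumed weight function)."""
    A = np.asarray(A, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.ones(len(y))
    beta = None
    for _ in range(n_iter):
        sw = np.sqrt(w)
        # Weighted least squares: scale rows of A and y by sqrt(weight).
        beta, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
        resid = y - A @ beta
        # Robust residual scale from the median absolute deviation.
        scale = np.median(np.abs(resid - np.median(resid))) / 0.6745
        if scale == 0.0:
            break
        u = np.abs(resid) / (c * scale)
        w = 1.0 / np.maximum(u, 1.0)     # weight 1 inside the band, 1/u outside
    return beta

# Synthetic trend with noise and a spike every 20th day.
rng = np.random.default_rng(0)
t = np.arange(730, dtype=float)
omega = 2.0 * np.pi * t / 365.25
A = np.column_stack([np.ones_like(t), t, np.sin(omega), np.cos(omega)])
true_beta = np.array([10.0, 0.01, 2.0, 0.0])
y = A @ true_beta + rng.normal(size=t.size)
y[::20] += 40.0

beta_ols, *_ = np.linalg.lstsq(A, y, rcond=None)
beta_irls = irls_fit(A, y)
ols_err = np.linalg.norm(beta_ols - true_beta)
irls_err = np.linalg.norm(beta_irls - true_beta)
```

On data like this, the spikes pull the ordinary least-squares intercept upward, while the reweighted fit stays close to the underlying trend.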

After the district-level regression, we plot the actual and predicted time series and the ACF of the residuals:

The actual and predicted time series and the ACF of the residuals for the beat-level regression:

We pick two dates (2014-08-01 and 2014-08-02) and map, at the district level: (1) the actual count (overlaid with the actual crime incident locations), (2) the predicted count, and (3) the prediction error.

Spatial mapping, at the beat level, of: (1) the actual count (overlaid with the actual crime incident locations), (2) the predicted count, and (3) the prediction error.